Extracting macroscopic information from Web links
Abstract
Much has been written about the potential and pitfalls of macroscopic web-based link analysis, yet there have been no studies that have provided clear statistical evidence that any of the proposed calculations can produce results over large areas of the web that correlate with phenomena external to the Internet. This article attempts to provide such evidence through an evaluation of Ingwersen’s (1998) proposed external Web Impact Factor (WIF) for the original use of the web: the interlinking of academic research. In particular, it studies the case of the relationship between academic hyperlinks and research activity for universities in Britain, a country chosen for its variety of institutions and the existence of an official government rating exercise for research. After reviewing the numerous reasons why link counts may be unreliable, it demonstrates that four different WIFs do, in fact, correlate with the conventional academic research measures. The WIF delivering the greatest correlation with research rankings was the ratio of web pages with links pointing at research-based pages to faculty numbers. The scarcity of links to electronic academic papers in the data set suggests that, in contrast to citation analysis, this WIF is measuring the reputations of universities and their scholars, rather than the quality of their publications.
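As a rough illustration of the calculation the abstract describes, the sketch below computes a WIF as the number of external pages linking to a university's research pages divided by its faculty count, then checks how well that ratio tracks an external research rating. The `University` record, the sample figures, and the choice of Spearman rank correlation are assumptions made for illustration only; the abstract says just that the WIFs "correlate" with conventional research measures and does not name the statistic used.

```python
from dataclasses import dataclass


@dataclass
class University:
    name: str
    inlink_pages: int       # external pages linking to research-based pages
    faculty: int            # number of faculty members
    research_rating: float  # score from an external research assessment


def web_impact_factor(u: University) -> float:
    """Ratio of inlinking pages to faculty numbers, as described in the abstract."""
    return u.inlink_pages / u.faculty


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical figures for illustration; these are NOT data from the study.
sample = [
    University("A", 1200, 400, 5.0),
    University("B", 300, 300, 3.0),
    University("C", 900, 450, 4.0),
    University("D", 100, 250, 2.0),
]

wifs = [web_impact_factor(u) for u in sample]
rho = spearman(wifs, [u.research_rating for u in sample])
```

Dividing by faculty numbers normalizes for institution size, so a large university does not score highly merely by having more staff and pages. A rank-based correlation is used here because research ratings are ordinal; a different statistic could be substituted without changing the WIF calculation.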
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Towards Intelligent Information Retrieval on Web
The World Wide Web is an information resource with virtually unlimited potential. However, this potential is relatively untapped because it is difficult for machines to process and integrate this information meaningfully and today the WWW links more than 15 billion pages. The retrieval of relevant information on web is an issue that is of main concern. As the internet grew and became popular, m...
Semi-Structured File Analysis for Information Integration
This paper describes a PostScript file analyzer for extracting information from Web PostScript documents. Our motivation for studying this problem is the building of an information integration system. The information extracted from these semi-structured files can be used to model the contents of Web information sources and to define semantic links between items of information. Extracted informat...
Extracting knowledge from the World Wide Web.
The World Wide Web provides an unprecedented opportunity to automatically analyze a large sample of interests and activity in the world. We discuss methods for extracting knowledge from the web by randomly sampling and analyzing hosts and pages, and by analyzing the link structure of the web and how links accumulate over time. A variety of interesting and valuable information can be extracted, s...
An Automated Algorithm for Extracting Website Skeleton
The huge amount of information available on the Web has attracted many research efforts into developing wrappers that extract data from webpages. However, as most of the systems for generating wrappers focus on extracting data at page-level, data extraction at site-level remains a manual or semiautomatic process. In this paper, we study the problem of extracting website skeleton, i.e. extractin...
Journal title:
- JASIST
Volume 52
Publication date: 2001